Running head: INTERMEDIATE PRODUCTIONS AND LISTENER EXPECTATIONS
The Role of Intermediate Productions and Listener Expectations on the Perception of Children's Speech
Abstract
Purpose: This paper examined whether naïve listeners could perceive phonetic detail in children's productions of /s/ and /θ/, and whether their perception of /s/ and /θ/ could be biased by their belief about the child's overall speech-production ability.
Method: In Experiment 1, listeners provided judgments of children's productions of /s/ and /θ/ using a visual analog scale (VAS). In Experiment 2, different listeners provided 'correct' and 'incorrect' judgments of the same tokens in a task in which they were led to believe that some children were older and had more-accurate speech, and others were younger and had less-accurate speech. For Experiment 1, linear regression modeling was used to determine the relationship between VAS responses and psychoacoustic characteristics of the stimuli. For Experiment 2, within-subjects comparisons of accuracy judgments for the two conditions were conducted.
Results: In Experiment 1, VAS judgments showed that listeners were able to perceive fine phonetic detail in children's productions, including differences between correct productions and clear substitutions. In Experiment 2, listener bias was found to have a small influence on listener judgments.
Conclusions: Naïve listeners are able to perceive fine phonetic detail in children's speech. Moreover, this perception is relatively impervious to bias.
Children learn to speak like adults in a remarkably short period of time. Numerous cross-sectional and longitudinal studies have shown that by the age of only 5 or 6 years, children produce most or all of the sounds of their language correctly, as judged by phonetic transcriptions made by experienced transcribers (e.g., Smit, Hand, Freilinger, Bernthal, & Bird, 1990). From the first vocalizations to the point at which children's productions are transcribed as completely accurate, children are learning contrasts. 
The earliest contrasts that children learn may be as simple as the contrast between different syllable shapes. As children later produce more adult-like speech, they are able to produce even such fine-grained contrasts as the contrast between two highly similar sounds.
Just as is the case with many other aspects of language acquisition, the acquisition of contrast is gradual. The mechanics of this gradual change, however, can be understood differently depending on how the acquisition of contrast is measured. Many large-scale studies of speech-sound acquisition (e.g., Sander, 1972; Smit et al., 1990; Templin, 1957) use phonetic transcription as a basis for assessing the development of contrast. Incorrect productions are generally classified as deletions, substitutions, or distortions of target consonants. Errors are often grouped together as a system of 'phonological processes' (e.g., Stampe, 1979) that simplify adult forms. Consider a hypothetical child who is transcribed as producing [t] for /k/ and [d] for /g/. In this approach, the child would be characterized as having fronting errors, which arise because he or she has not yet developed a contrast between velar and alveolar stops. The transcribed errors are taken to be substantively equivalent to the production of the same sounds in correct words (i.e., the [t] in key is taken to be equivalent to the [t] in tea). From this perspective, the transition from consistent [t] for [k] production, to inconsistent production of the correct phonemes across different words and phonetic contexts, to correct production in all target environments would be evidence of gradual acquisition. Thus, within this view, gradual change refers to the gradually increasing percentage of correctly produced phonemes across the lexicon, as the phonological categories of the language are mastered. 
A second way of thinking about gradual acquisition is to focus on a finer-grained level of detail, specifically, the gradual acoustic or articulatory differentiation of similar sounds (see Hewlett & Waters, 2004 for a review). Consider again a child producing what is transcribed as a [t] for target /k/. Detailed articulatory and acoustic studies would examine developmental changes in the articulatory-acoustic differentiation of target /t/ from target /k/. In contrast to studies that characterize phonological development from transcribed samples, articulatory-acoustic research suggests that children's development of contrast progresses gradually from productions of, for example, /t/ and /k/ that are undifferentiated from one another to productions that are robustly differentiated. As a consequence, children's developmental paths include points during which they produce "intermediate productions." These are productions with acoustic-articulatory properties that are intermediate between a target phoneme and the phoneme associated with the error. The [t] that a child with an apparent fronting error produces in the word key may be substantially different from the [t] in the word tea. In addition, some productions of the initial sound in key may sound as if they are in between a /t/ and a /k/. Thus, when thinking about gradient phonological learning from this perspective, the focus is on the acquisition of contrast at the level of the speech-sound category, rather than on the acquisition of contrast at the level of the lexicon.
Support for gradient change in articulatory-acoustic learning can be found in literature describing covert contrast. Covert contrast occurs when significant acoustic differences are present between two phonemes in a child's speech, but both phonemes are transcribed with the same symbol. 
Because both variants fall within a single adult perceptual category, transcribers perceive the two variants as the same phoneme. Covert contrast has been found in the speech of typically developing children and children with phonological disorders (e.g., Baum & McNutt, 1990; Forrest, Weismer, Elbert, & Dinnsen, 1994; Forrest, Weismer, Hodge, Dinnsen, & Elbert, 1990; Hewlett, 1988; Li, Edwards & Beckman, 2009; Macken & Barton, 1980; Scobbie, Gibbon, Hardcastle, & Fletcher, 2000).
One of the earliest studies of covert contrast was by Macken and Barton (1980), who examined four children's development of the stop consonant voicing contrast by measuring voice onset time (VOT) in word-initial stop consonants. They found that before the children acquired an adult-like VOT contrast, they went through two stages. Initially, the children had no VOT differences between target voiced and voiceless stops. Next, the children went through a phase in which they produced target voiceless stops with longer VOTs than voiced stops. However, most of these productions had VOTs that fell into the adult range for voiced stops. As a result, the children's voiceless stops were perceived as voiced stops. That is, children were perceived as substituting voiced stops for target voiceless stops.
Other studies have also found evidence for covert contrast, not only for voicing contrasts, but also for contrasts involving place of articulation for both stops and fricatives (Baum & McNutt, 1990; Forrest et al., 1990, 1994; Gierut & Dinnsen, 1986; Maxwell & Weismer, 1982). For example, Baum and McNutt (1990) compared children's correct productions of /θ/ with correct productions of /s/ and frontal misarticulations of /s/. Although frontal misarticulations of /s/ are commonly described as a substitution of /θ/, acoustic analyses revealed significant differences between frontal misarticulations and correct productions of both /s/ and /θ/. 
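The logic of a covert contrast can be made concrete with a small numerical sketch. The Python fragment below is an illustration only, not the analysis used in any of the studies cited above. The VOT values are hypothetical, the 30 ms adult category boundary is an assumed value chosen for the example, and the helper names `welch_t` and `proportion_below` are invented here. It shows the two ingredients of a covert contrast: the two target categories differ reliably in their acoustics, yet nearly all tokens of one category fall inside the adult perceptual range of the other.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical VOT measurements in milliseconds (illustrative only,
# not data from any study cited in the text).
voiced_targets_ms = [5, 8, 10, 12, 7, 9, 11, 6]
voiceless_targets_ms = [14, 18, 22, 16, 20, 24, 15, 19]

# Assumed adult perceptual boundary between voiced and voiceless stops
# (a hypothetical value chosen for this sketch).
ADULT_BOUNDARY_MS = 30


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(b) - mean(a)) / standard_error


def proportion_below(values, boundary):
    """Share of tokens whose VOT falls below the boundary (heard as voiced)."""
    return sum(v < boundary for v in values) / len(values)


# A large t statistic means the child separates the two targets acoustically.
t_stat = welch_t(voiced_targets_ms, voiceless_targets_ms)

# A proportion near 1 means adults would still hear the voiceless targets as voiced.
heard_as_voiced = proportion_below(voiceless_targets_ms, ADULT_BOUNDARY_MS)

print(f"Welch t = {t_stat:.2f}; voiceless targets heard as voiced = {heard_as_voiced:.0%}")
```

In this toy case the child's productions would be transcribed as voiced-for-voiceless substitutions even though the two targets are produced differently, which is the pattern the covert contrast literature describes.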
Recently, Kong (2009) provided further evidence of the gradient nature of speech-sound acquisition in an analysis of voice onset time (VOT) in children's productions of word-initial stop consonants. In a cross-sectional study of children aged 2 through 5 years, Kong found that there is a large range of VOT values in children's speech, spanning values that are appropriate for adult voiced categories to those appropriate for voiceless ones. Her results demonstrate that although the VOT distributions were highly adult-like overall, there was considerable variability in the children's productions. Children's VOT values did not all fall into clearly distinguishable voiced and voiceless categories. Instead, a natural continuum was formed, with some productions closely approximating the prototypical adult-like VOT values and others falling intermediate between prototypical VOT values for voiced and voiceless stops.
Implications of gradient change for transcription
One potential reason for differing reports of gradient versus categorical change in speech sound acquisition is the type of analysis tool used to characterize children's speech sound productions. In the field of phonological development and disorders, transcription has long been the preferred tool (and, often, the only tool) to identify and characterize children's speech sounds. Broad transcription typically involves a coarse-grained denotation of the production using phonetic symbols, which are used to make binary judgments of "correct" or "incorrect." Transcription relies on two main assumptions about how speech is perceived. First, it relies on the assumption that children's productions can be parsed into a finite set of phonetic categories. Second, it relies on the idea that these sounds can be denoted with a set of standard phonetic symbols. Both of these assumptions plausibly reflect the biases that occur because of categorical perception. 
Categorical perception refers to the observation that listeners parse continuous acoustic variation in obstruent consonants (and, to a lesser degree, other consonant manners) into a discrete set of phonemes, and that subtle acoustic differences within a category are imperceptible (e.g., Liberman, Harris, Hoffman, & Griffith, 1957). If listeners perceive children's productions categorically, then children's intermediate productions would not be reflected in transcriptions. These productions will be heard as correct if they remain in the same perceptual category as the target, even if they vary acoustically from the prototypical adult form. Productions will be heard as clear substitutions only if their acoustic characteristics are consistent with another perceptual category. Macken and Barton's (1980) findings illustrate this point. Using transcription alone, children's productions during the second phase of development would likely be transcribed as a substitution error and denoted with the phonetic symbol [b], [d], or [g]. Labeling the production as a substitution error does provide a coarse-grained description of the child's production. However, more fine-grained information is lost, namely the fact that the children were making a systematic contrast between voiced and voiceless stops.
This highlights the limitations that are imposed when phonetic transcription is used as the sole method for denoting children's speech productions. When children's productions vary subtly in acoustic-phonetic properties from a prototypical adult form, the use of transcription may obscure potentially important information, such as that the children are capable of producing a subphonemic contrast between two phonemes. In other words, children may perceive that two phonemes are different and have begun to produce them differently at a subphonemic level, but are simply not yet able to consistently produce an adult-like contrast. 
This distinction has important implications for the assessment and treatment of children with disorders in speech-sound production, as it has been shown that children who exhibit covert contrast progress more quickly in therapy than children without covert contrast (Tyler, Figurski & Langdale, 1993). For this reason, it has been suggested that acoustic analysis be used to supplement transcription (Kent, 1996). Certainly, acoustic analysis can provide a wealth of information about children's speech sounds. However, it is also extremely time-consuming, and, as many clinicians point out, it is often impractical in clinical practice for this reason. It would also likely be impossible to develop adequate norms solely using acoustic analysis, as acoustic variation relates both to the attainment of adult-like speech motor control and to the development of an adult-sized and adult-shaped vocal tract. In addition, transcription can be done "on the fly" as a child speaks. This allows clinicians to provide immediate feedback to children during treatment. This is not a possibility with manual acoustic analysis, at least with current technology.
The question then arises: how can we accurately perceive and differentiate intermediate productions from targets and substitutions in the clinical assessment of children's speech? Is it possible for transcription to be used in such a way that intermediate productions are reliably identified and important subtle cues distinguishing productions are not lost? The answer to this question depends in part on listeners' ability to perceive subphonemic variation. If perception were strictly categorical (i.e., if listeners were unable to perceive subtle subphonemic differences, regardless of the perception task), it would not seem possible to perceive and denote intermediate productions during transcription. 
However, there is significant evidence that listeners are able to perceive a wide range of subphonemic variation in speech when given the appropriate task, and that the apparent inability to do so in some earlier studies was a consequence of the method used to assess perception. In tasks that elicit categorical judgments, listeners perceive speech sounds categorically, with little attention to subtle, subphonemic acoustic differences. Other speech perception tasks, however, do not ask listeners to respond categorically, and given these tasks, listeners are able to detect subphonemic detail. A number of studies have shown that listeners can perceive subtle within-category acoustic differences for obstruents in certain tasks (e.g., McMurray, Tanenhaus, & Aslin, 2002; Carney, Widin, & Viemeister, 1977; Pisoni & Tash, 1974). The perception of subphonemic detail appears to affect perception and recognition at multiple levels of abstraction beyond mere sensory perception. For example, McMurray et al. (2002) found that subphonemic acoustic differences were involved in patterns of lexical activation in a minimal pair word discrimination task. These studies have primarily relied on synthetic speech, varying along the VOT dimension, as stimuli.
There is relatively little research on adults' perception of the natural 'continua' that are present in children's speech as a consequence of their intermediate productions. The small body of research that has addressed this topic has focused on important clinical consequences of this variation. For example, research suggests that transcribers are less reliable in identifying some misarticulated speech sounds as compared with correctly produced sounds (Pye, Wilcox, & Siren, 1988). Although Pye et al. 
(1988) do not specifically comment on the nature of these misarticulated speech sounds, it is possible that one reason for decreased reliability is that these speech errors reflected intermediate productions that did not clearly fall into a single adult perceptual category. Other researchers have established that some adult listeners are able to distinguish between correct productions of /r/ and productions of /r/ that are intermediate between /r/ and /w/ in synthesized child speech (Sharf, Ohde, & Lehman, 1988; Wolfe, Martin, Borton, & Youngblood, 2003). To our knowledge, no research has been done to address how adults perceive naturally occurring, within-category variation in the obstruent productions of young children.
Given that listeners appear able to perceive these subphonemic differences, several suggestions have been made for how these might be denoted and incorporated into standard clinical assessments. Both Stoel-Gammon (2001) and Edwards and Beckman (2008) have suggested that one way to improve transcription reliability is to distinguish between intermediate productions and correct productions or clear substitutions. To the extent that listeners are able to perceive intermediate sounds, this would allow coding of subtle distinctions that may be lost using the standard transcription process. Another related possibility is to use a scaling procedure in which listeners judge not simply whether a given sound is correct or incorrect, but how correct or incorrect it is. One such method uses visual analog scaling (VAS). VAS is often used in the assessment of complex, multidimensional percepts, such as the perception of pain in clinical medical settings. There is considerable research on the reliability and validity of this measure in the pain literature (e.g., Price, McGrath, Rafii & Buckingham, 1983; Bijur, Silver, & Gallagher, 2001; Gallagher, Liebman & Bijur, 2001). 
It is also used widely in the study of voice disorders and is part of one standardized voice assessment, the CAPE-V (Kempster, Gerratt, Verdolini Abbott, Barkmeier-Kraemer, & Hillman, 2009). VAS has also been used to study adults' perception of children's speech. In one such procedure, Urberg-Carlson, Kaiser, and Munson (2008) used a horizontal line with endpoints representing two contrasting phonemes: /s/ and /ʃ/. Listeners were asked to use a mouse to click the point on the line where they perceived that a given sound production fell along the continuum. Urberg-Carlson et al. found a strong correlation between the click location for individual stimuli and the centroid frequency of the stimuli, where centroid frequency is the principal acoustic parameter discriminating the endpoints /s/ and /ʃ/.
The basic principle behind both of these techniques (i.e., using VAS to measure phoneme production or expanding transcription to include symbols for intermediate productions) is that using a less categorical, non-binary measure in auditory perceptual tasks may help to elicit judgments that are more consistent with the gradient nature of speech development. However, few studies have systematically examined the effects of these suggestions on transcription validity and reliability. This study aims to fill this gap by examining the utility of VAS ratings to assess children's productions of /s/ and /θ/.
The role of listener bias in perceptual judgments
There exists one final concern with transcription that has not yet been addressed: the potential for bias in perceptual judgments. This is a concern for transcription in general, given the ample evidence that speech-sound perception can be biased by numerous factors. In other words, a constant auditory signal may be perceived differently by the same listener solely based on his or her expectations regarding the talker. 
These listener expectations may stem from a variety of different sources of information about a talker. Johnson, Strand, and D'Imperio (1999) found that expectations regarding the gender of a talker influenced vowel perception, such that a given vowel sound was perceived differently depending on whether listeners believed the talker was a man or a woman. Sociolinguistic expectations related to regional dialect have also been shown to affect speech perception. Niedzielski (1999) found that Detroit listeners perceived the diphthong /aʊ/ differently depending on whether they believed the talker was from Detroit or Canada. Similarly, it has been demonstrated that listeners from New Zealand perceive diphthongs differently if they believe the talker is from New Zealand versus from Australia (Hay, Nolan, & Drager, 2006; Drager & Hay, 2006). Hay, Warren, and Drager (2006) also found that expectations regarding a talker's age and social class affected listeners' perceptions of that talker's diphthongs. Additionally, the classic McGurk effect shows that visually based listener expectations affect how adults perceive speech sounds (McGurk & MacDonald, 1976). Listener expectations have even been shown to affect whether a non-speech acoustic signal is heard as speech (Remez, Rubin, Pisoni & Carrell, 1981).
Studies using adult speech and synthetic speech stimuli suggest that listener expectations play a larger role in the perception of ambiguous stimuli than in the perception of unambiguous stimuli (Diehl, Lotto, & Holt, 2004; Samuel, 2001). The effect of expectations may be particularly strong in the perception of children's speech, because children's speech is more variable than that of adults (Baum & McNutt, 1990; Kong, 2009) and may contain speech sound errors (e.g., Ingram, 1976), including these intermediate (i.e., more ambiguous) productions. 
However, very little research has directly investigated the influence of listener expectations on the perception of children's speech, and the results are equivocal. For example, Meitus, Ringel, House, and Hotschkiss (1973) found no evidence that listener expectations based on case history information influenced adults' perception of children's speech. On the other hand, Wilson and Gasek (1975) reported that experienced speech-language pathologists perceived children's speech differently depending on whether they were biased to think a child had a "mild-moderate" articulation disorder versus a "moderate-severe" disorder. Thus, a second focus of this study is to examine individual listeners' susceptibility to bias when perceiving children's /s/ and /θ/ productions.
Aims of the current study
The present study aimed to address these concerns with transcription by exploring how adult listeners perceive the speech of young children, with a special focus on their perception of intermediate productions. Specifically, we looked at how adults perceive children's correct productions of /s/ and /θ/, clear substitutions ([s] for /θ/ and [θ] for /s/), and intermediate productions (tokens perceived to be neither clearly /s/ nor clearly /θ/). The /s/ and /θ/ sounds were chosen for several reasons. First, both are typically mastered relatively late in development (e.g., Sander, 1972; Fudala & Reynolds, 1986; Smit et al., 1990). Additionally, children have often been observed to produce /θ/-like sound substitutions for /s/ (McGlone & Proffitt, 1973; Smit et al., 1990). Indeed, in the speech of 100 English-speaking children recorded for a larger project (Edwards & Beckman, 2008), numerous cases of frontal misarticulation (/θ/-like sounds produced for /s/) were observed, and these errors were the predominant errors of /s/ in the Smit et al. (1990) study. 
By including correct productions, clear substitutions, and intermediate productions, we essentially created a natural "continuum" of speech sounds ranging from /s/ to